Scalable coding of video sequences using tone mapping and different color gamuts

ABSTRACT

A Scalable Video Coding (SVC) process is provided for scalable video coding that takes into account color gamut primaries along with spatial resolution. The process provides for re-sampling using video color data obtained from an encoder or decoder process of a base layer (BL) in a multi-layer system to enable improved encoding and decoding in an enhancement layer (EL) or higher layers taking into account color conversion between layers. Examples of applicable SVC include MPEG-4 Advanced Video Coding (AVC) and High Efficiency Video Coding (HEVC). With the SVC process, video data expressed in one color gamut space can be used for prediction in encoding with a possibly different color space, and accommodation for different spatial resolution and bit-depth can be made as well.

CROSS REFERENCE TO RELATED APPLICATION

This application claims priority under 35 U.S.C. §119(e) from earlier-filed U.S. Provisional Application Ser. No. 61/955,773 filed on Mar. 19, 2014 and incorporated herein by reference in its entirety.

BACKGROUND

1. Technical Field

The present invention relates to the process of using a two-layer Scalable Video Coding (SVC) scheme for encoding and decoding video sequences that are derived from the same source but differ in resolution. More specifically, it relates to the arrangement, prediction, and reconstruction of video data obtained from an encoder or decoder process during scalable coding. Examples of scalable encoder or decoder processes include MPEG-4 Advanced Video Coding (AVC) and High Efficiency Video Coding (HEVC), where the scalable form of HEVC can be labeled Scalable HEVC (SHVC).

2. Related Art

An example of a scalable video coding system using two layers, where color tone mapping can be applied, is shown in FIG. 1. In the system of FIG. 1, one of the two layers is the Base Layer (BL), where a BL video is encoded in an encoder E0, labeled 100, and decoded in a decoder D0, labeled 102, to produce a base layer video output BL out. The BL video is typically at a lower quality than the Enhancement Layer (EL), which receives an input y from the Full Resolution (FR) layer. The EL includes an encoder E1, labeled 104, for encoding the FR video, as well as a decoder D1, labeled 106. When encoder E1 104 encodes the full resolution video, cross-layer (CL) information from the BL encoder 100 is used to produce enhancement layer (EL) information. The corresponding EL bitstream of the full resolution layer is then decoded in decoder D1 106, using the CL information from decoder D0 102 of the BL, to output full resolution video, FR out. By using CL information in a scalable video coding system, the encoded information can be transmitted more efficiently in the EL than if the FR video were encoded independently without the CL information. Examples of SVC coding that can use the two layers shown in FIG. 1 include video coding using AVC and HEVC.

In spatial scalability, the BL is typically at a lower spatial resolution than the Full Resolution (FR), as illustrated in FIG. 1, where a downsampling conversion process is applied from FR to BL. FIG. 1 shows block 108 with a down-arrow r illustrating a resolution reduction from the FR to the BL, indicating that the BL can be created by downsampling the FR layer data. Overall, the down arrow of block 108 illustrates that, for scalability, the base layer BL is typically at a lower spatial resolution than the full resolution FR layer. The multilayer methods described also apply when there are more than two layers.

The CL information from the BL layer can be used, after upsampling, to enhance the coding of the FR video in the EL. In the system of FIG. 1, in combination with the upsampler of FIG. 2, the CL information includes pixel information derived from the encoding and decoding process of the BL. Because the BL pictures are at a different spatial resolution than the FR pictures, a BL picture needs to be upsampled (or re-sampled) back to the FR picture resolution in order to generate a suitable prediction for the FR picture.

SUMMARY

Embodiments of the present invention provide systems for SVC that account for color gamut conversion between layers and, in some embodiments, spatial resolution conversion as well. The process provides for re-sampling using video color data obtained from an encoder or decoder process of a base layer (BL) in a multi-layer system to enable improved encoding and decoding in an enhancement layer (EL) or higher layers, taking into account color conversion between layers. For example, with the reconstructed data, video data expressed at a lower resolution in one color gamut space can be used to predict higher resolution video in another color gamut space, and the prediction can also account for a different bit-depth.

In a further embodiment, a different color mapping is applied to different regions of a video frame. The mapping to different frame regions can be done by at least one of the following procedures: (a) signaling linear or non-linear 3DLUT color mapping parameters with an adaptive quad-tree structure; (b) signaling mapping parameters in the slice or tile headers to create the same spatial freedom for correcting the color tones; (c) signaling to reuse collocated partitioning and color mapping parameters from previous frames; and (d) using the adaptive quad-tree partitioning to adaptively signal filter parameters in the case that spatial scalability is also applied.

In a further embodiment, color tone mapping from a base to a target color gamut and spatial scaling are applied separately, in an order where one is applied first and then the other. In one embodiment, when the color gamut scaling is applied as a tone mapping function on the encoder side, the tone mapping function is applied after spatial scaling. The decoder side then applies the two operations in the reverse order, with the reverse tone mapping performed before spatial up-scaling.

In a further embodiment, the tone mapping is applied as a function mapping a vector of three color values in one color gamut space to a corresponding vector of three color values in a different gamut space. The mapping can also map to values in the same color space. The mapping at the encoder is then applied on a three-color-component grid that is different from the grid the Base Layer (BL) is on. In this embodiment, the relative location of luma and chroma samples in the vertical and horizontal dimensions is signaled to the decoder, enabling the decoder to adjust the sample locations to reverse those used for the tone mapping in the forward direction.

BRIEF DESCRIPTION OF THE DRAWINGS

Further details of the present invention are explained with the help of the attached drawings in which:

FIG. 1 is a block diagram of components in a scalable video coding system with two layers;

FIG. 2 illustrates an upsampling process that can be used to convert the base layer data to the full resolution layer data for FIG. 1;

FIG. 3 shows the downsampler of FIG. 1 that also allows for color mapping;

FIG. 4 shows the upsampler of FIG. 2 that also allows for color mapping;

FIG. 5 shows a block diagram of components for implementing the upsampling process of FIG. 4 according to an embodiment of the present invention;

FIG. 6 shows a process for one embodiment of the present invention where, during up-sampling, spatial resolution conversion is applied first and then the color mapping to the target color gamut space is applied; and

FIG. 7 shows an alternative to FIG. 6, where the color mapping is done first and spatial up-sampling takes place afterward.

DETAILED DESCRIPTION

Color Tone Mapping Overview

To properly display a captured picture or video on different displays, embodiments of the present invention apply a color mapping that maps the video display content from one color space to another, or within the same color space. In its most common form, this process maps the color tones of a pixel in the picture, given as a set of primary color values in one layer, to a different set of color values for another layer, referred to as the target color gamut. The mapping is applied so that the color values for the second layer are suitable for presenting the content on displays conforming to the target color gamut. The sections below describe features of a Scalable Video Coding (SVC) process that provides for such color tone mapping.
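As a minimal illustration of mapping each pixel's color triplet into a target gamut, the Python sketch below applies a fixed 3x3 matrix to every pixel of a picture. It assumes linear-light RGB input (transfer functions are ignored) and uses the BT.709-to-BT.2020 primaries conversion matrix tabulated in ITU-R BT.2087; the function name map_color_triplets is illustrative only, and the mappings contemplated here may instead be non-linear and content dependent.

```python
import numpy as np

# 3x3 matrix converting linear-light BT.709 RGB to linear-light BT.2020 RGB,
# as tabulated in ITU-R BT.2087 (rounded to four decimals).
BT709_TO_BT2020 = np.array([
    [0.6274, 0.3293, 0.0433],
    [0.0691, 0.9195, 0.0114],
    [0.0164, 0.0880, 0.8956],
])


def map_color_triplets(rgb: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Map every (R, G, B) triplet of a (..., 3) array with a 3x3 matrix.

    Each output triplet depends only on the input triplet at the same
    spatial location (hypothetical helper, for illustration only).
    """
    return rgb @ matrix.T


# Toy linear-light BT.709 picture, 4 x 4 pixels.
picture_709 = np.random.default_rng(0).random((4, 4, 3))
picture_2020 = map_color_triplets(picture_709, BT709_TO_BT2020)
```

Because each row of the matrix sums to one, reference white (1, 1, 1) maps to itself, while saturated BT.709 primaries land inside the wider BT.2020 gamut.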

I. Scalability Process Accounting for Color Gamut and Bit-depth

In some embodiments of the present invention, the same captured content needs to be displayed on different displays with a different color gamut specification, possibly a different number of bits per sample, and possibly a different resolution. The process of color mapping takes a triplet sample from one color gamut space and maps it to the corresponding sample at the same spatial location in the other color gamut space. This process can be non-linear and content or region dependent. The downsampling process that considers color and spatial conversion from the FR layer to the BL is illustrated in FIG. 3 and can be applied in block 108 of FIG. 1.
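Because the mapping can be non-linear, one common way to represent a general triplet-to-triplet function is a three-dimensional look-up table (3DLUT) sampled on a regular grid and evaluated with trilinear interpolation. The sketch below assumes normalized sample values in [0, 1] and a cube-shaped table; build_lut and apply_lut are hypothetical helper names, not part of any codec specification.

```python
import numpy as np


def build_lut(fn, size: int = 17) -> np.ndarray:
    """Sample a triplet-to-triplet mapping fn on a size^3 grid over [0, 1]^3."""
    axis = np.linspace(0.0, 1.0, size)
    r, g, b = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.stack([r, g, b], axis=-1)      # shape (size, size, size, 3)
    return fn(grid)


def apply_lut(rgb: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Map (..., 3) triplets through a 3D LUT using trilinear interpolation."""
    size = lut.shape[0]
    pos = np.clip(rgb, 0.0, 1.0) * (size - 1)
    i0 = np.minimum(np.floor(pos).astype(int), size - 2)   # lower corner of the cell
    frac = pos - i0                                         # position inside the cell
    out = np.zeros(rgb.shape[:-1] + (lut.shape[-1],))
    # Blend the 8 corners of the enclosing grid cell.
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                w = ((frac[..., 0] if dr else 1.0 - frac[..., 0])
                     * (frac[..., 1] if dg else 1.0 - frac[..., 1])
                     * (frac[..., 2] if db else 1.0 - frac[..., 2]))
                corner = lut[i0[..., 0] + dr, i0[..., 1] + dg, i0[..., 2] + db]
                out += w[..., None] * corner
    return out


# Hypothetical non-linear tone curve stored as a 17-point-per-axis LUT.
tone_lut = build_lut(lambda c: c ** 0.8, size=17)
mapped = apply_lut(np.random.default_rng(1).random((4, 4, 3)), tone_lut)
```

Trilinear interpolation reproduces any affine mapping exactly, so the LUT only introduces approximation error where the underlying mapping is curved.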

The upsampling process, which involves different color gamut spaces and resolutions from BL to EL, is shown in FIG. 4 and can be used in block 200 of FIG. 2. The process of FIG. 4 can be applied at both the encoder and decoder side. For the upsampling shown in FIG. 4, the data at resolution x is derived from the encoding and decoding process at the BL. A BL picture is processed by a combination of color mapping and upsampling, in either order, to generate an output y′ as shown in FIG. 4, which can be used as a basis for predicting the original EL input y.

FIG. 5 shows a more detailed block diagram for implementing the upsampling process of FIG. 4 in embodiments of the present invention. The upsampling or re-sampling process can be chosen to minimize an error E (e.g. mean-squared error) between the upsampled data y′ and the full resolution data y. The system of FIG. 5 includes a select input samples module 500 that samples an input video signal. The system further includes a select filter and/or color mapping module 502, which selects a filter or map from the subsequent filter and/or re-map samples module 504 to upsample the input samples selected by module 500.
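The error-minimizing choice made by modules 502 and 504 can be sketched as an exhaustive search over candidate filters or maps. In this hypothetical example each candidate is a Python function from BL samples x to a prediction y′, and the candidate with the smallest mean-squared error against y is kept. Since a decoder does not have y, a real system would need to signal (or otherwise derive) which candidate was selected.

```python
import numpy as np


def mse(a, b) -> float:
    """Mean-squared error between two equally shaped arrays."""
    return float(np.mean((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2))


def select_candidate(x, y, candidates):
    """Return (name, y_prime, error) for the candidate whose output is closest to y.

    candidates: dict mapping a name to a function that turns BL data x into
    a prediction y' with the same shape as the EL target y (hypothetical API).
    """
    best = None
    for name, fn in candidates.items():
        y_prime = fn(x)
        err = mse(y_prime, y)
        if best is None or err < best[2]:
            best = (name, y_prime, err)
    return best


x = np.array([0.0, 2.0, 4.0])                   # toy BL row
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 4.0])    # toy EL row at twice the resolution
candidates = {
    "repeat": lambda v: np.repeat(v, 2),         # sample-and-hold upsampling
    "linear": lambda v: np.interp(np.arange(2 * len(v)) / 2, np.arange(len(v)), v),
}
name, y_prime, err = select_candidate(x, y, candidates)
```

Here the linear interpolator reproduces y exactly, so it is selected over sample repetition.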

In module 500, a set of input samples in a video signal x is first selected. In general, the samples can be a two-dimensional subset of samples in x, and a two-dimensional filter or two-dimensional mapping structure can be applied to the samples, depending on the set of input samples. The module 502 receives the data samples in x from module 500 and identifies the appropriate filter or map function in module 504 to which the samples are directed.

For the case where separate filters are used, a filter h[n; m] is applied along the rows and columns to the selected samples to produce an output value of y′, or in this case y′[m] for each of the columns. Typically, this can be implemented with a set of M filters h, where for the output value y′[m] at output index m, the filter h[n; m mod M] is chosen and applied to the corresponding input samples x of the rows. The filters h[n; p], where p = m mod M, generally correspond to filters with M different phase offsets, for example phase offsets of p/M, where p = 0, 1, . . . , M−1. Filtering the selected input samples with the selected filter h[n; m mod M] produces the output value y′.
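The phase-selection rule h[n; m mod M] amounts to a polyphase filter bank. The sketch below assumes M = 2 with two toy filters (phase 0 copies the input sample, phase 1 averages two neighbors, i.e. phase offsets 0 and 1/2), clamps indices at picture edges, and applies the same 1D bank along rows and then columns. The filter taps are illustrative, not the filters of any particular standard.

```python
import numpy as np


def polyphase_upsample_1d(x: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Upsample a 1D signal by M = len(filters) using M phase filters.

    filters[p] holds the taps h[n; p]; tap n is applied to input sample
    x[m // M + n], with indices clamped at the signal edge.
    """
    M, taps = filters.shape
    N = len(x)
    y = np.empty(N * M)
    for m in range(N * M):
        p = m % M                          # phase: use filter h[n; m mod M]
        base = m // M                      # input position aligned with output m
        idx = np.clip(base + np.arange(taps), 0, N - 1)
        y[m] = np.dot(filters[p], x[idx])
    return y


def separable_upsample_2d(img: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Apply the 1D polyphase bank along rows, then along columns."""
    rows = np.apply_along_axis(polyphase_upsample_1d, 1, img, filters)
    return np.apply_along_axis(polyphase_upsample_1d, 0, rows, filters)


# Hypothetical M = 2 bank: phase 0 copies, phase 1 averages two neighbors.
H = np.array([[1.0, 0.0],
              [0.5, 0.5]])
```

For example, polyphase_upsample_1d(np.array([0.0, 2.0, 4.0]), H) interleaves the original samples with midpoints, giving [0, 1, 2, 3, 4, 4] (the last value is repeated because of edge clamping).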

In addition to filtering, a color mapping calculation may be applied to convert to a different or the same color space. This mapping operation can be performed to minimize an error cost. FIG. 5 shows that both the upsampling and color processing operations may be performed in the same prediction process, using either filtering or mapping for color conversion.
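One simple way to choose color mapping parameters that minimize an error cost is a least-squares fit. The sketch below assumes an affine model (a 3x3 matrix plus an offset) between co-located BL and EL color triplets and minimizes their mean-squared error with numpy.linalg.lstsq. It is a stand-in for richer non-linear mappings and ignores the rate needed to signal the parameters; the function names are hypothetical.

```python
import numpy as np


def fit_affine_color_map(bl: np.ndarray, el: np.ndarray):
    """Least-squares fit of el ≈ bl @ A.T + b over all co-located pixels.

    bl, el: (..., 3) arrays of color triplets at the same spatial locations.
    Returns (A, b) minimizing the mean-squared error.
    """
    X = bl.reshape(-1, 3)
    Y = el.reshape(-1, 3)
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])   # extra column of ones for the offset
    coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)    # coef has shape (4, 3)
    A = coef[:3].T
    b = coef[3]
    return A, b


def apply_affine_color_map(rgb: np.ndarray, A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Apply the fitted mapping to every triplet."""
    return rgb @ A.T + b
```

When the EL really is an affine function of the BL, the fit recovers A and b exactly; otherwise it returns the affine mapping with the smallest squared prediction error.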

The modules in FIGS. 3, 4, or 5 can include one or more processors and memory devices that enable the functions described to be accomplished. The memory is configured to store code that, when executed by the processor, causes the module to function as described to process video signals. The memory can also store data that enables the described functions to be accomplished. In addition to the modules of FIGS. 3-5, other components of FIG. 1 can include such processor and memory components.

II. Color Mapping Enhancements

The following sections describe further features that can be applied in embodiments of the present invention for SVC to better account for color and spatial conversion.

A. Signaling the Order of Spatial Scaling and Color Mapping

In cases where both spatial scaling and color gamut scaling are required, the order in which those processes are performed at the encoder can vary. Since tone mapping is usually highly non-linear, and both down-sampling and tone mapping are non-reversible operations, it is proposed in some embodiments to signal the order in which the decoder should apply color gamut scaling and spatial scaling when reconstructing the prediction for the higher resolution at a different color gamut.

An example of combined spatial and color gamut scalability is going from 1080p BT.709 for the BL to 4K BT.2020 for the EL. In this case, the encoder has the option of converting from 4K BT.2020 to 4K BT.709 first and then down-sampling the 4K BT.709 to 1080p BT.709. Alternatively, the down-sampling takes place first to generate 1080p BT.2020, and then the color mapping creates 1080p BT.709 from 1080p BT.2020.

In a further embodiment, a flag in the bitstream indicates, in a normative manner, the order in which BL reconstructed samples are processed to generate the EL prediction samples. For example, in one case shown in FIG. 6, the up-sampling 600 takes place first and then the color mapping 602 converts to the color gamut space. In another example shown in FIG. 7, the color mapping 602 is done first and then spatial up-sampling 600 takes place. The order of these two processes should be decided by the encoder so as to minimize some measure of error (e.g. distortion) or cost (e.g. a combination of rate and distortion), and then indicated in the bitstream.
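The encoder-side decision and the decoder-side use of the order flag can be sketched as follows. The map_first argument plays the role of the flag (its name and one-bit form are assumptions), upsample2x is a toy 2x linear interpolator, and color_map is a hypothetical non-linear per-pixel curve. Distortion is measured as MSE only, without a rate term.

```python
import numpy as np


def upsample2x(img: np.ndarray) -> np.ndarray:
    """Toy 2x upsampling of an H x W x C picture: linear interpolation per axis."""
    for axis in (0, 1):
        n = img.shape[axis]
        nxt = np.take(img, np.minimum(np.arange(n) + 1, n - 1), axis=axis)
        mid = 0.5 * (img + nxt)                        # midpoints (edge clamped)
        stacked = np.stack([img, mid], axis=axis + 1)  # interleave originals and midpoints
        shape = list(img.shape)
        shape[axis] *= 2
        img = stacked.reshape(shape)
    return img


def color_map(img: np.ndarray) -> np.ndarray:
    """Hypothetical non-linear per-pixel tone curve, for illustration only."""
    return np.clip(img, 0.0, 1.0) ** 0.8


def predict_el(bl: np.ndarray, map_first: bool) -> np.ndarray:
    """Build the EL prediction in the order given by the (hypothetical) flag."""
    if map_first:
        return upsample2x(color_map(bl))   # FIG. 7 order: color mapping, then up-sampling
    return color_map(upsample2x(bl))       # FIG. 6 order: up-sampling, then color mapping


def choose_order(bl: np.ndarray, el: np.ndarray) -> bool:
    """Encoder side: pick the flag value whose prediction has the lower MSE."""
    err = {flag: np.mean((predict_el(bl, flag) - el) ** 2) for flag in (False, True)}
    return min(err, key=err.get)
```

With an affine per-pixel mapping and an interpolator whose weights sum to one, both orders would yield the same prediction; the flag only matters because at least one of the two operations is non-linear.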

If a video sequence should be processed to have a desired color gamut at a resolution different from the resolution and color gamut in which the video was captured, then it is proposed to apply the tone mapping functions on the encoder side after the spatial scaling takes place. One advantage of this order is that it creates less interference with the intended colors of the BL video. This order is also expected to yield higher coding efficiency for the scalable compression, since the reverse tone mapping at the decoder would be more accurate if it were performed before up-sampling, thus avoiding the distortion caused by spatial scaling.

B. Signaling Color Mapping Parameters Based on Content in Regions of a Frame

As color mapping is usually done to maintain the artistic intention of the scene, different mappings can be expected to apply to different regions of the frame. Therefore, it is proposed to allow the encoder to signal different color mapping parameters for different localities in a given BL picture.

In one example, this can be done by signaling linear or non-linear (e.g. by a three-dimensional look-up table (3DLUT)) color mapping parameters with an adaptive quad-tree structure. In another example, color mapping parameters can be signaled in the slice or tile headers to create the same spatial freedom for correcting the color tones. Because local content often undergoes similar artistic modifications across consecutive frames, it is possible to signal reuse of collocated partitioning and color mapping parameters from previous frames. In addition, the adaptive quad-tree partitioning can be used to adaptively signal filter parameters in the case that spatial scalability is also applied.
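A simplified version of the adaptive quad-tree idea is sketched below. Each leaf carries its own per-channel gain and offset (a hypothetical stand-in for the linear or 3DLUT parameters), and a square block is split into four only when the total squared error plus a per-leaf penalty lam decreases; lam is an assumed proxy for the bits needed to signal each leaf. The sketch assumes a square picture whose side is a power of two and does not model slice or tile headers or parameter reuse across frames.

```python
import numpy as np


def fit_gain_offset(bl, el):
    """Per-channel least-squares gain/offset: el ≈ gain * bl + offset."""
    params = []
    for c in range(bl.shape[-1]):
        x, y = bl[..., c].ravel(), el[..., c].ravel()
        if x.max() == x.min():                 # flat block: offset-only model
            params.append((1.0, float(np.mean(y - x))))
        else:
            gain, offset = np.polyfit(x, y, 1)
            params.append((float(gain), float(offset)))
    return params


def apply_gain_offset(bl, params):
    return np.stack([g * bl[..., c] + o for c, (g, o) in enumerate(params)], axis=-1)


def quadtree_color_map(bl, el, y0=0, x0=0, size=None, min_size=4, lam=1.0):
    """Split a square block while SSE + lam * (number of leaves) decreases.

    Returns (cost, leaves), each leaf being (y0, x0, size, params).
    """
    if size is None:
        size = bl.shape[0]
    blk = (slice(y0, y0 + size), slice(x0, x0 + size))
    params = fit_gain_offset(bl[blk], el[blk])
    sse = float(np.sum((apply_gain_offset(bl[blk], params) - el[blk]) ** 2))
    leaf_cost = sse + lam                      # lam: assumed cost of signaling one leaf
    if size // 2 < min_size:
        return leaf_cost, [(y0, x0, size, params)]
    half = size // 2
    split_cost, split_leaves = 0.0, []
    for dy in (0, half):
        for dx in (0, half):
            c, leaves = quadtree_color_map(bl, el, y0 + dy, x0 + dx, half, min_size, lam)
            split_cost += c
            split_leaves += leaves
    if split_cost < leaf_cost:
        return split_cost, split_leaves
    return leaf_cost, [(y0, x0, size, params)]


def render(bl, leaves):
    """Decoder-side view: apply each leaf's parameters to its own region."""
    out = np.zeros(bl.shape)
    for y0, x0, size, params in leaves:
        blk = (slice(y0, y0 + size), slice(x0, x0 + size))
        out[blk] = apply_gain_offset(bl[blk], params)
    return out
```

In a frame whose left and right halves were graded differently, the root block is split once and each 8x8 quadrant keeps a single parameter set, because splitting further would only add leaf cost.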

C. Content Dependent Assignment of BL and EL to a Pair of Sequences With Different Color Gamuts

If there are no other criteria, such as resolution or video quality preference, it is proposed to assign the sequences with different color gamuts to BL and EL such that a cost or error is minimized. For example, the scalable encoding of two 1080p sequences, one in the BT.2020 and the other in the BT.709 color space, can result in a different overall bit-rate and average PSNR when BT.709 is used as BL and BT.2020 as EL than when BT.2020 is used as BL and BT.709 as EL.
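The assignment decision reduces to evaluating both orderings and keeping the cheaper one. In the text the cost comes from actually encoding (overall bit-rate and average PSNR); the sketch below instead assumes a cheap proxy, the residual MSE after the best affine prediction of the EL from the BL, and accepts any other cost function through the hypothetical cost parameter.

```python
import numpy as np


def affine_residual_mse(bl: np.ndarray, el: np.ndarray) -> float:
    """Proxy cost (assumption): MSE left after the best affine prediction of el from bl."""
    X = bl.reshape(-1, 3)
    Y = el.reshape(-1, 3)
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return float(np.mean((X1 @ coef - Y) ** 2))


def assign_layers(seq_a, seq_b, cost=affine_residual_mse):
    """Return (bl, el, cost) for the cheaper of the two BL/EL assignments."""
    cost_ab = cost(seq_a, seq_b)   # seq_a as BL, seq_b as EL
    cost_ba = cost(seq_b, seq_a)   # seq_b as BL, seq_a as EL
    if cost_ab <= cost_ba:
        return seq_a, seq_b, cost_ab
    return seq_b, seq_a, cost_ba
```

Replacing the proxy with a function that runs a real scalable encoder and returns a rate-distortion cost would match the criterion described above more closely, at a much higher computational price.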

D. Proposed Signaling of Chroma-Luma Alignment for Color Mapping

In most cases, the tone mapping function is a mapping from a vector of three color values in one color gamut space to a corresponding vector of three color values in a different gamut space. In some cases, the color mapping at the encoder has been applied on a three-color-component grid that is different from the grid the BL is on (e.g. due to chroma subsampling, 4:4:4 vs. 4:2:0, or spatial scalability). It is proposed that in these cases the relative location of luma and chroma samples (vertically and horizontally) be signaled, so that the decoder can adjust the sample locations to reverse those used for the forward tone mapping.
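The decoder-side adjustment can be illustrated for 4:2:0 content, where chroma has half the luma resolution in each direction. The sketch assumes the signaled information is a chroma sample offset, in luma sample units, for each direction (for example 0 for chroma co-sited with even luma samples, or 0.5 for chroma midway between two luma samples); the parameter names offset_x and offset_y are hypothetical, and linear interpolation is an assumed choice of resampling filter.

```python
import numpy as np


def chroma_to_luma_grid_1d(c: np.ndarray, n_luma: int, offset: float) -> np.ndarray:
    """Resample one line of 2x-subsampled chroma onto n_luma luma positions.

    offset: (hypothetically signaled) position of chroma sample 0 relative to
    luma sample 0, in luma samples; chroma sample k sits at offset + 2 * k.
    """
    luma_pos = np.arange(n_luma)
    chroma_pos = (luma_pos - offset) / 2.0                 # luma coordinates -> chroma index
    return np.interp(chroma_pos, np.arange(len(c)), c)     # np.interp clamps at the edges


def chroma_to_luma_grid(c: np.ndarray, luma_shape, offset_x: float, offset_y: float):
    """Bring a 4:2:0 chroma plane onto the luma grid: horizontally, then vertically."""
    h, w = luma_shape
    rows = np.apply_along_axis(chroma_to_luma_grid_1d, 1, c, w, offset_x)
    return np.apply_along_axis(chroma_to_luma_grid_1d, 0, rows, h, offset_y)
```

With the chroma on the same grid as luma, the decoder can evaluate the three-component color mapping at the positions the encoder used; a different offset shifts every interpolated chroma value.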

Although the present invention has been described above with particularity, this was merely to teach one of ordinary skill in the art how to make and use the invention. Many additional modifications will fall within the scope of the invention as that scope is defined by the following claims.

What is claimed:
 1. A method for scalable video coding comprising: receiving sampling signals from a video of the first coding layer and providing an output signal to a second coding layer that codes video with an enhanced resolution having a higher resolution than the base resolution; selecting a picture from the input samples of the video signal in the first coding layer for coding video with the base resolution; selecting either a plurality of filters or a mapping formula that converts a set of primary color values for a pixel in the picture with a primary color gamut in the first coding layer to a different set of color values making up a target color gamut that is suitable for presentation on a display used for the second coding layer that conforms to the target color gamut.
 2. The method of claim 1, wherein in addition to color gamut scaling from the primary color gamut to the target color gamut, spatial scaling is provided.
 3. The method of claim 2, wherein the spatial scaling and color gamut scaling are applied in order, with one of the spatial scaling and color gamut scaling being applied first, and then the other.
 4. The method of claim 3, wherein spatial scaling is applied, and wherein when the color gamut scaling is applied for a tone mapping function on an encoder side, the tone mapping function occurs after spatial scaling.
 5. The method of claim 4, wherein a reverse order of the spatial scaling and the tone mapping is applied at a decoder side.
 6. The method of claim 2, wherein a flag is provided in a bitstream of the video to indicate the order in which the color gamut scaling and the spatial scaling are provided.
 7. The method of claim 1, wherein a different mapping would be applied to different regions of a frame for the picture.
 8. The method of claim 7, wherein the different mapping in the different frame regions is done by at least one of the following: (a) signaling linear or non-linear three dimensional look up table (3DLUT) color mapping parameters with an adaptive quad-tree structure; (b) signaling mapping parameters in the slice or tile headers to create the same spatial freedom for correcting the color tones; (c) signaling to reuse collocated partitioning and color mapping parameters from previous frames; and (d) using an adaptive quad-tree partitioning to adaptively signal filter parameters in the case that spatial scalability is also applied.
 9. The method of claim 1, wherein the primary color gamut and target color gamut are assigned different sequences for a Base Layer (BL) and an Enhancement Layer (EL).
 10. The method of claim 1, when the color gamut scaling is applied for a tone mapping function, wherein the tone mapping is a function mapping from a vector of three color values in one color gamut space to a corresponding vector of three color values in a different gamut space, and wherein the mapping at the encoder is applied on a three-color-component grid that is different from a grid that a Base Layer (BL) is on.
 11. The method of claim 10, wherein a relative location of luma and chroma samples in vertical and horizontal dimensions is signaled to a decoder to enable the decoder to adjust the sample locations to reverse the one used for the tone mapping in a forward direction.
 12. The method of claim 1, when the color gamut scaling is applied for a tone mapping function, wherein the tone mapping is a function mapping from one color gamut space to the same gamut space.
 13. A system for scalable video coding comprising: a first coding layer comprising modules for coding video with a base resolution; a second coding layer comprising modules for coding video with an enhanced resolution having a higher resolution than the base resolution; an upsampling unit receiving sampling signals from the first coding layer and providing an output signal to the second coding layer after an upsampling process, wherein the upsampling unit output signal enables more efficient coding in the second coding layer, wherein the first coding layer modules comprise: a sampling module that provides sampling signals of a video for the first coding layer; a picture selection module that selects a picture from the input samples of the video signal from the sampling module; a color conversion module that selects either a plurality of filters or a mapping formula for converting a set of primary color values for a pixel in the picture with a primary color gamut in the first coding layer to a different set of color values making up a target color gamut that is suitable for presentation on a display used for the second coding layer that conforms to the target color gamut for providing to the upsampling unit.
 14. The system of claim 13, wherein the first coding layer modules further comprise: a spatial scaling module that provides spatial scaling separate from the color conversion from the first coding layer for providing to the second coding layer.
 15. The system of claim 14, wherein when the color gamut scaling is applied for a tone mapping function on an encoder side, the tone mapping occurs after the spatial scaling, and wherein a reverse order of the spatial scaling and the tone mapping is applied at a decoder side.
 16. The system of claim 14, wherein a flag is provided in a bitstream of the video to indicate the order in which the color gamut scaling and the spatial scaling are provided.
 17. The system of claim 13, wherein a different mapping would be applied to different regions of a frame, for the picture, and wherein the different mapping in the different frame regions is done by at least one of the following: (a) signaling linear or non-linear three dimensional look up table (3DLUT) color mapping parameters with an adaptive quad-tree structure; (b) signaling mapping parameters in the slice or tile headers to create the same spatial freedom for correcting the color tones; (c) signaling to reuse collocated partitioning and color mapping parameters from previous frames; and (d) using an adaptive quad-tree partitioning to adaptively signal filter parameters in the case that spatial scalability is also applied.
 18. The system of claim 13, wherein when the color gamut scaling is applied for a tone mapping function, the tone mapping is a function mapping from a vector of three color values in one color gamut space to a corresponding vector of three color values in a different gamut space, and wherein the mapping at the encoder is applied on a three-color-component grid that is different from a grid that a Base Layer (BL) is on.
 19. The system of claim 18, wherein a relative location of luma and chroma samples in vertical and horizontal dimensions is signaled to a decoder to enable the decoder to adjust the sample locations to reverse the one used for the tone mapping in a forward direction.
 20. The system of claim 13, when the color gamut scaling is applied for a tone mapping function, wherein the tone mapping is a function mapping from one color gamut space to the same gamut space.